DOCUMENT RESUME

ED 352 007                                          IR 015 836

AUTHOR          Kieras, David E.
TITLE           Semantics-Based Reference Resolution in Technical
                Text Processing: An Exploration of Using the WordNet
                Database in the Computerized Comprehensibility
                System.
INSTITUTION     Michigan Univ., Ann Arbor. Technical Information
                Design and Analysis Lab.
SPONS AGENCY    Office of Naval Research, Arlington, VA. Cognitive
                and Neural Sciences Div.
REPORT NO       TR-92/ONR-35
PUB DATE        30 Aug 92
CONTRACT        N00014-88-K-0133
NOTE            15p.
PUB TYPE        Reports - Research/Technical (143)
EDRS PRICE      MF01/PC01 Plus Postage.
DESCRIPTORS     Artificial Intelligence; *Computer Software
                Development; *Computer System Design; Databases;
                Editing; Expert Systems; Proofreading; *Technical
                Writing; *Word Processing; Writing (Composition)
IDENTIFIERS     *Automated Copy Editing; Machine Learning; *WordNet
                Database



ABSTRACT 

The Computerized Comprehensibility System (CCS)
provides an automated copy editing function, generating a mark-up of
a draft of a technical document by simulating the simpler
comprehension processes of a human reader, and then criticizing the
text when these simple processes cannot successfully comprehend the
material. A key CCS function is criticizing the coherence of the
material by tracking which objects are mentioned in the passage. A
common comprehensibility problem is that the text mentions a new
object using the syntactic structures appropriate for an already
known object. If the reader must make an inference that presence of
the new object is implied by the earlier mentioned object, the result
is a potential break in the coherence of the text. CCS criticizes all
such coherence breaks. However, many such inferences are actually
easy for most readers, since only general knowledge is required to
make the inference, rather than specialized knowledge about the
domain. If so, then the CCS criticism of a coherence break is a false
alarm. This report describes exploratory work with an augmented form
of CCS, in which the WordNet database is used as a source of general
knowledge to allow CCS to make the same kind of general knowledge
inferences that human readers do to overcome coherence breaks.

Semantics-Based Reference Resolution
in Technical Text Processing:
An Exploration of Using the WordNet Database in the
Computerized Comprehensibility System

David E. Kieras 
University of Michigan 

Abstract 

The Computerized Comprehensibility System (CCS) provides an automated copy editing function, generating a
"mark-up" of a draft of a technical document by simulating the simpler comprehension processes of a human reader,
and then criticizing the text when these simple processes cannot successfully comprehend the material. A key CCS
function is criticizing the coherence of the material by tracking which objects are mentioned in the passage. A
common comprehensibility problem is that the text mentions a new object using the syntactic structures appropriate
for an already-known object. If the reader must make an inference that presence of the new object is implied by the
earlier-mentioned object, the result is a potential break in the coherence of the text. CCS criticizes all such coherence
breaks. However, many such inferences are actually easy for most readers, since only general knowledge is required
to make the inference, rather than specialized knowledge about the domain. If so, then the CCS criticism of a
coherence break is a false alarm. This report describes exploratory work with an augmented form of CCS, in which
the WordNet database is used as a source of general knowledge to allow CCS to make the same kind of general
knowledge inferences that human readers do to overcome coherence breaks.



Introduction

This report describes some results obtained by extending the Computerized Comprehensibility System (CCS)
described in Kieras (1989, 1990) to make use of the semantic lexicon database developed by Miller and his
coworkers (Miller, Beckwith, Fellbaum, Gross, & Miller, 1990), called WordNet. CCS is a system which provides
an automated copy editing function by generating a "mark-up" of a draft of a technical document. It has been more
completely described elsewhere; here only the basic functions will be summarized. Figure 1 shows the overall
structure of the CCS system. CCS attempts to do a full grammatical parse of the sentence structure, followed by an
attempt to perform simple reference resolution on each noun phrase. Finally, it integrates the sentence content into a
representation of the content of the passage as a whole. A set of criticism rules can comment on poor grammatical
structure, inconsistent terminology, and lack of coherence of each sentence with the rest of the passage. As described
elsewhere, the advantage of such a system relative to conventional computer-based writing aids is that because it
actually attempts to mimic the simpler comprehension processes of a human reader, it can be sensitive to when the
writer has made too many comprehension demands upon the reader. For example, if CCS cannot resolve a reference,
then the writer has apparently expected the reader to perform an inference in order to comprehend the sentence in the
context of the rest of the passage.



Simple Reference Resolution in CCS 

CCS represents the contents of a passage using a propositional semantic network, based on Anderson's ACT
representation (1976). Along the lines of the given-new distinction (Haviland & Clark, 1974; Clark & Haviland,
1977), CCS attempts to identify the given, or already known, item in a sentence, and then adds the new information
in the sentence to the representation. Thus each noun phrase is matched against the representation of the previous
sentences in the passage in order to identify which referent is being referred to. This matching can sometimes be
done simply on the basis of the word strings involved, but more generally, it must be done in terms of the
propositional representation specified by the noun phrases and passage content. Complex noun phrases such as the
bearings that the oil that the pump circulates lubricates are matched recursively; the most interior noun phrase is
matched and the results are then used in an attempt to match the next-outermost noun phrase. This process is
called simple reference resolution because the processing is done strictly in terms of the immediate surface and
propositional content of the passage; no semantic knowledge about the word meanings, or general knowledge about
the world, is used in this process.

Figure 1. Structure of the Computerized Comprehensibility System (CCS). The work in this report concerns
the Reference Resolution Module.

For example, in the simple passage shown in Table 1, the title introduces the main lube oil system as the main
topic of the passage. The first sentence of the passage refers directly to the system with the identical set of words,
main lube oil system. However, the second sentence refers to the lube oil system, a phrase that is similar, but not
identical, to main lube oil system, yet refers to the same object, the system. The required matching is more complex
than simply matching words; for example, the second sentence also refers to oil in the phrase lubricating oil. Even
though the word oil has appeared previously, this referent, the lubricating oil, has not previously appeared.
Moreover, the third sentence refers to the oil that the pump circulates. This oil is the same referent as the
lubricating oil mentioned in the second sentence, and must be recognized as such, even though the form of the noun
phrase is completely different: the third sentence describes the oil in terms of being circulated by the pump, but
this description was not a previous noun phrase; it was the main proposition of the second sentence. In
processing this passage, CCS isolates each individual noun phrase and attempts to match it against previously
mentioned items in the passage. Thus, even simple reference resolution can be complex.



Table 1

An example passage used to demonstrate CCS functions



MAIN LUBE OIL SYSTEM

The main lube oil system consists of a main lube oil pump, an auxiliary lube oil pump, and a duplex lube oil
strainer. The function of the lube oil system is to circulate lubricating oil to the turbine and gearbox gears.
For example, the bearings that the oil that the pump circulates lubricates support the turbine rotor. The duty
officer is responsible for observing the pressure gauges. The lube oil system is critical for operation of the



CCS has the ability to match a reference to a referent representation based on whether any proper subset of the
previous predicates is mentioned in the to-be-matched noun phrase. Thus the main system could be matched against
the main lube oil system. The restriction is that the head noun, e.g., system, must be identical. Although the CCS
software had provision for distinguishing between words and the denoted concept, no attempt was made to represent
which words designated the same concept; CCS did not recognize synonyms for words. Thus the plane would not
match a previously mentioned airplane. Likewise, CCS had no semantic information, for example that airplanes had
wings, that airplanes flew, or that airplanes are a member of the more general class of aircraft.
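
The matching restriction described above can be illustrated with a minimal sketch. It is written in Python rather than the LISP used by CCS, and it treats a noun phrase as a plain list of words ending in the head noun, which is an assumption made for illustration: CCS itself matches against a propositional representation, not word lists. The function name `matches` is hypothetical. An identical phrase is also allowed to match here, as in the first sentence of Table 1.

```python
def matches(noun_phrase, referent):
    """Return True if noun_phrase could refer to the earlier referent.

    Both arguments are lists of words whose last element is the head noun
    (an illustrative simplification, not the CCS representation).
    """
    *np_modifiers, np_head = noun_phrase
    *ref_modifiers, ref_head = referent
    # The head nouns must be identical; no synonyms are recognized.
    if np_head != ref_head:
        return False
    # Every modifier in the new phrase must already belong to the referent.
    return set(np_modifiers) <= set(ref_modifiers)


main_system = ["main", "lube", "oil", "system"]
print(matches(["main", "system"], main_system))   # modifiers are a subset
print(matches(["plane"], ["airplane"]))           # different head nouns
```

Because only identical head nouns are compared, the second call fails even though plane and airplane denote the same thing, which is the limitation the augmented CCS tries to address.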

A basic argument supporting this rather severe limitation on a comprehension system was that an automated copy
editing system would not be practical if it had to be stocked with detailed domain knowledge before it could be
used. Rather, CCS was defined without such knowledge in an effort to see whether a useful editorial tool could be
obtained without any domain knowledge. That is, many of the known problems with the comprehensibility of text
are problems at the level of internal sentence structure or internal textual structure. For example, a long-stated
problem in technical documentation is inconsistent terminology. Consistent terminology would be characterized by
an identical or near-identical use of the same string of words to refer to each individual object. Identifying this
problem can be done without domain knowledge. In addition, the typical user of a technical document cannot be
relied upon to have much domain knowledge; clearly if they had considerable domain knowledge it is unlikely that
they would be referring to the document at all. All of these considerations led to the initial decision to develop
CCS without any domain knowledge.

However, CCS pays considerable attention to a serious form of incoherence, in which there is no easy-to-determine
relationship between the sentence and the previous content of the passage, because there are apparently no shared
referents. When this break in the coherence of the passage occurs, there is often a rather severe demand on the
reader's inference-making abilities. For example, the Table 1 passage mentions the duty officer and the pressure
gauges at the end of the first paragraph. Presumably if the reader is a sailor in the U.S. Navy, he or she will
probably know what the duty officer is. However, unless they have domain expertise, they certainly will not realize
what the pressure gauges are. Thus an important function of CCS is to point out where there are such failures of
coherence, and which of the referents have not previously appeared in the passage.



The Problem 

The problem is that in many cases coherence failures are not useful criticisms of the material because it is
reasonable to assume that every reader can easily make the required inference. For example, as shown in the CCS
output excerpt in Table 2, CCS comments that the second sentence, referring to the wings, has no relationship with
the previous material about an aircraft, and that the wings in the second sentence is a questionable new referent,
which is defined as a definite noun phrase that refers to a textually new referent. That is, it is referred to as if the
reader should know about it, being a definite noun phrase, but since this is the first mention, it must be a new
referent. Thus, for example, using the wings at this point in the passage is incorrect; the readers should not be cued
that they already know about an object that in fact they haven't yet seen. Avoiding this criticism would require
rewriting the second sentence to introduce the wings as a new object, such as in an indefinite noun phrase in the
sentence predicate, as in The aircraft has wings that are in a swept-back configuration.

Table 2

Excerpt from CCS output with input sentences shown in boldface



The F-16 aircraft is a high-performance fighter.

The wings have a swept-back configuration.

The main proposition of this sentence is PROP9:
- REF3 WINGS has relation HAVE
to REF4 (SWEPT-BACK CONFIGURATION).

NO-KNOWN-REFERENTS 

This sentence does not appear to refer to anything previously mentioned, 
and so readers may not understand how it relates to the rest of the material. 
Be sure that the sentence directly and clearly refers to a previous item. 

QUESTIONABLE-NEW-REFERENT 

These items were referred to as if the reader already knows about them, 
but they could not be matched with something previously introduced: 
REF3 WINGS 

Check: Can your reader easily figure out what you are referring to? 




However, everybody knows that airplanes have wings. It would certainly be desirable if CCS was "smart" enough
not to bother the writer with criticisms of incoherence in such obvious circumstances. In other words, rather than
giving CCS knowledge of every specific domain that technical material might be prepared in, CCS might be much
more useful if it made use of general knowledge to understand references. CCS would then make the inferences
that all readers would make, and thus not criticize the [...]



Goal of this Work 



It might appear that the required body of general knowledge would be gigantic, and thus impractical to
incorporate in any real system. However, it can be argued that the general knowledge required to resolve many kinds
of reference is in fact very limited, consisting of such simple semantic relationships as part-whole and subset-
superset. For example, the coherence inference in the passage shown in Table 2 could be dealt with by simply
applying the fact that airplanes have wings. Other cases, such as referring to an object by its superset, e.g., Lassie [...]

The WordNet Project (Miller et al., 1990), sponsored by the Office of Naval Research, has produced a semantic
lexicon in which words are grouped together into synonym sets, and various semantic relationships have been
specified between those classes. For example, the relationships that airplanes have wings and a collie is a dog are
represented in this database.

This work was undertaken to determine whether such a database could be used to effectively improve the quality
of the criticisms that CCS can produce. This work is just preliminary, and so it is not a definitive treatment
of the problems of such an approach. However, it does give some initial indications of what problems
must be solved to effectively make use of such a general semantic lexicon in the context of a
comprehension model, or a text critiquing system, such as CCS.

In the remainder of this report, the key technical features of the work will be summarized; this consists first of a
description of how the WordNet database was integrated into CCS, and how the [...]. This is followed by a
summary of some results where criticisms produced by this augmented CCS are compared with those produced by
the original version of CCS. Finally, some conclusions and some suggestions for future work will be stated.

Method 

Simplification Approach 

[...] mechanisms. A further simplification was that instead of attempting to [...]

Database Format

[...] words that can be used to refer to that concept, and pointers to related concepts, such as supersets, subsets, parts, or
wholes. All of the simple semantic relationships in WordNet were included in this representation, but only the
subset/superset and part/whole relations were used in this work.

Table 3 gives an example of a few entries in this representation. The file containing these entries can be simply
read by a LISP program which automatically represents each word or concept as a symbol, by virtue of LISP's built-
in mechanisms, and the relations between these concepts and words can be represented as properties and attributes
using LISP's property-list feature. Together with LISP's built-in symbol referencing system, this approach provides
direct access from one point in the semantic network to another simply by using the GET function in LISP. Thus
the expression (GET '^N-AIRPLANE '<<) returns the symbol ^N-AIRCRAFT.
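
As a rough illustration of this kind of property-list access, here is a small Python analogue. It is only a sketch: the dictionary `PROPERTIES` and the function `get` are hypothetical stand-ins for LISP symbols and the GET function, and the two sample concepts carry only a few of the relations listed in Table 3. Unlike the LISP expression quoted above, this version returns a list, since a concept can have several related concepts under one relation.

```python
# Hypothetical stand-in for LISP symbol property lists: each concept label
# maps relation markers to the related concept labels.
PROPERTIES = {
    "^N-AIRPLANE": {"<<": ["^N-AIRCRAFT"], ">P": ["^N-WING6", "^N-FUSELAGE"]},
    "^N-AIRCRAFT": {"<<": ["^N-VEHICLE"], ">>": ["^N-AIRPLANE"]},
}


def get(symbol, relation):
    """Return the concepts linked to symbol by relation (empty list if none)."""
    return PROPERTIES.get(symbol, {}).get(relation, [])


print(get("^N-AIRPLANE", "<<"))
```

Each lookup is a direct step from one point in the network to a neighboring one, so following a chain of relations is just a sequence of such calls.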



Table 3

Sample of reformatted WordNet database



(^N-AIRCRAFT (AIRCRAFT) <M ^N-FLEET >P ^N-SKELETONa >> ^N-HELICOPTER >> ^N-GLIDER
    >P ^N-FUEL_GAUGE >> ^N-DRONE3 >P ^N-CABIN2 >P ^N-COCKPIT2 >> ^N-LIGHTER-THAN-AIR_CRAFT
    >> ^N-AIRPLANE >P ^N-AIRCRAFT_ENGINE << ^N-VEHICLE)

(^N-AIRCRAFT_CARRIER (AIRCRAFT_CARRIER CARRIER FLATTOP ATTACK_AIRCRAFT_CARRIER)
    >P ^N-FLIGHT_DECK >P ^N-ARRESTER << ^N-WARSHIP)

(^N-AIRCRAFT_ENGINE (AIRCRAFT_ENGINE) <P ^N-AIRCRAFT << ^N-ENGINE2)

(^N-AIRDOCK (AIRDOCK HANGAR REPAIR_SHED SHED) <P ^N-AIRPORT << ^N-BUILDING3)

(^N-AIR_FILTER (AIR_FILTER) >> ^N-FILTER_TIP <P ^N-VENTILATOR << ^N-FILTER2)

(^N-AIRFOIL (AIRFOIL AEROFOIL) >> ^N-WING6 >> ^N-VERTICAL_TAIL >> ^N-STABILIZER
    >> ^N-RUDDER >> ^N-ROTOR_BLADE >> ^N-FLAP5 >> ^N-ELEVATOR >> ^N-HORIZONTAL_STABILIZER
    >> ^N-AILERON << ^N-DEVICE2)

(^N-AIR_HAMMER (AIR_HAMMER JACKHAMMER PNEUMATIC_HAMMER) << ^N-HAMMER5)

(^N-AIR_HOLE (AIR_HOLE) << ^N-HOLE8)

(^N-AIR_INTAKE (AIR_INTAKE) <P ^N-CARBURETOR << ^N-DUCT2)

(^N-AIRLINEa (AIRLINE) << ^N-TRANSPORTATION_SYSTEM)

(^N-AIRLINE (AIRLINE) << ^N-HOSE3)

(^N-AIRLINER (AIRLINER) >P ^N-SEAT5 >P ^N-GALLEY << ^N-AIRPLANE)

(^N-AIRLOCK (AIRLOCK AIR_LOCK) << ^N-CHAMBER2)

(^N-AIR_PASSAGE (AIR_PASSAGE AIR_DUCT AIRWAY) >P ^N-VENT2 >> ^N-UPCAST >> ^N-SNORKEL2
    >> ^N-DOWNCAST << ^N-DUCT2)

(^N-AIRPLANE (AIRPLANE AEROPLANE PLANE) >P ^N-WING6 >P ^N-WINDSHIELD >> ^N-TURBOJET
    >> ^N-SEAPLANE >P ^N-RADOME >> ^N-PROPELLER_PLANE >P ^N-POD2 >> ^N-MONOPLANE
    >P ^N-LANDING_GEAR >> ^N-JETS >P ^N-FUSELAGE >> ^N-FIGHTER4 >P ^N-ESCAPE_HATCH >P ^N-COWL
    >> ^N-BOMBER >> ^N-BIPLANE >> ^N-AMPHIBIAN >> ^N-AIRLINER << ^N-AIRCRAFT)

(^N-AIRPLANE_PROPELLER (AIRPLANE_PROPELLER AIRSCREW PROP) <P ^N-PROPELLER_PLANE
    << ^N-PROPELLER)

(^N-AIRFIELD (AIRFIELD LANDING_FIELD) >P ^N-TAXIWAY >P ^N-RUNWAY >> ^N-AUXILIARY_AIRFIELD
    >P ^N-APRON2 >> ^N-AIRSTRIP >> ^N-AIRPORT <P ^N-TRANSPORTATION_SYSTEM << ^N-FACILITY5)



Key: Each entry is of the form:

(<concept> <list of synonyms for the concept> <semantic relation> <related concept> ...)
Concept labels are prefixed by "^N-". The relations are:

<< / >> = subset/superset, <P / >P = part-of/has-part, <M / >M = member-of/has-member.
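
To make the entry format in the key concrete, the following Python sketch parses one entry into its concept label, synonym list, and relations. It assumes that entries follow the key exactly (a concept, a parenthesized synonym list, then relation/concept pairs); the function name `parse_entry` and the error handling are illustrative choices and are not taken from the CCS software, which read the file directly in LISP.

```python
import re

# Relation markers from the key of Table 3.
RELATIONS = {"<<", ">>", "<P", ">P", "<M", ">M"}


def parse_entry(text):
    """Parse '(concept (synonym ...) relation concept ...)' into its parts."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    # Expected shape: ( concept ( synonym ... ) relation target ... )
    if len(tokens) < 5 or tokens[0] != "(" or tokens[2] != "(" or tokens[-1] != ")":
        raise ValueError("malformed entry")
    concept = tokens[1]
    close = tokens.index(")", 3)          # end of the synonym list
    synonyms = tokens[3:close]
    rest = tokens[close + 1:-1]           # alternating relation, target
    if len(rest) % 2 != 0:
        raise ValueError("relation without a target concept")
    relations = {}
    for relation, target in zip(rest[0::2], rest[1::2]):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation {relation!r}")
        relations.setdefault(relation, []).append(target)
    return concept, synonyms, relations


entry = "(^N-AIRLINER (AIRLINER) >P ^N-SEAT5 >P ^N-GALLEY << ^N-AIRPLANE)"
print(parse_entry(entry))
```

Grouping targets by relation marker mirrors the property-list arrangement described earlier, where each relation acts as a property name on the concept symbol.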

The complete noun database was reduced to correspond to the CCS lexicon with a program that noted which
concepts in the semantic database corresponded to words in the lexicon, and then incorporated all of the semantic
relationships and concepts needed to connect the lexicon words together. For example, device and wing would be
related by a set of intervening set and part relations. The intervening concepts for the subsets and parts were
included in the reduced semantic database. The result was a semantic representation that included all of the semantic
information available about the words in the lexicon, and also had considerably more concepts and words, namely
those that related the lexicon words together.



Semantics-Based Reference Resolution Mechanisms



As mentioned before, the simple reference resolution process in CCS attempts to relate a noun phrase back to the
previously introduced items in the passage. Note that indefinite noun phrases, such as a magnetron, actually
introduce a new referent, and the reader is not normally expected to attempt to identify this item with a previously
mentioned item. Thus CCS only attempts to resolve definite noun phrases (those starting with the article the)
because these are normally a textual instruction to the reader to attempt to make such a connection. In the augmented
CCS, the standard simple reference resolution process was first attempted, and then any definite noun phrases that
remained unresolved (the questionable new references) were subject to a semantics-based search.

The basic strategy of the semantics-based reference resolution was to find a connection through the semantic
relationships between the unresolved definite noun phrase and some other item already mentioned in the passage.
This process essentially simulated a spreading activation search through the semantic network. First, the semantic
relations attached to the head noun of the unresolved noun phrase were examined, and the associated concepts
retrieved and put into working memory. It was found necessary to set an arbitrary limit of 100 such retrievals in
order to stop the system from getting lost in futile searches. Then a test was performed to determine whether any of
those concepts were appropriately related to the head nouns of previously mentioned items in the passage. If not, the
semantic relationships of the last set of concepts retrieved for the unresolved noun would then be followed,
a new set of concepts placed in working memory, and the test repeated. If the concepts were appropriately
related, then the new reference was designated as a resolved reference, and a proposition added to the passage
representation to show the relationship between this implied referent and the previously existing referents.
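
The search just described can be sketched as a breadth-first expansion with a cap on retrievals. The version below is a Python illustration under stated assumptions, not the CCS production rules: `find_connection`, `neighbors`, and `MAX_RETRIEVALS` are hypothetical names, concepts are plain strings, and the relatedness test is reduced to checking whether a retrieved concept was itself previously mentioned. The report's test for an "appropriate" relation is richer than this.

```python
MAX_RETRIEVALS = 100  # the arbitrary retrieval limit mentioned in the text


def find_connection(start_concepts, mentioned_concepts, neighbors):
    """Search outward from the unresolved noun's concepts, level by level.

    neighbors(concept) must return the concepts directly related to concept.
    Returns the first previously mentioned concept reached, or None if the
    network is exhausted or the retrieval limit is hit.
    """
    frontier = list(start_concepts)
    seen = set(frontier)
    for concept in frontier:
        if concept in mentioned_concepts:
            return concept
    retrievals = 0
    while frontier:
        next_frontier = []
        for concept in frontier:
            for related in neighbors(concept):
                if retrievals >= MAX_RETRIEVALS:
                    return None          # give up on a futile search
                retrievals += 1
                if related in seen:
                    continue
                if related in mentioned_concepts:
                    return related
                seen.add(related)
                next_frontier.append(related)
        frontier = next_frontier         # follow relations from the last set
    return None


toy_net = {"WING": ["AIRPLANE"], "AIRPLANE": ["AIRCRAFT"], "AIRCRAFT": []}
print(find_connection(["WING"], {"AIRCRAFT"}, lambda c: toy_net.get(c, [])))
```

Counting every retrieval, including repeats, keeps the cap simple; a different accounting would change how deep the search can go before stopping.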

A rather drastic simplification was made; only the head noun information in both the unresolved noun phrase and
previously mentioned noun phrases was used in the reference resolution. Because the modifiers in the noun phrases
were ignored, this simplification turned out to produce a great many false results, as will be described below.
However, for purposes of testing the approach, this provides a very liberal test in that it allows the system to make
use of any relationship found through the semantic network, regardless of whether the relationship is actually the
correct one.

There were three semantic relationships tested for in the reference resolution process. If these relationships were
found, then the new referent could be taken as implied by a previously mentioned referent. In the same-concept
relationship, the head noun of a new referent refers to a concept that a previous head noun also refers to, and thus the
previous referent implies the new referent. This computation allows reference to an item by a synonym, but since
only the head noun was used, many false resolutions resulted.

The second type of relationship, implied sub/superset, involved chaining through superset or subset relations so
that a previous item could be referred to in the new referent noun phrase by either a superset concept or a subset
concept. Note that normally, referring by a superset is well-defined, as in Lassie is a collie. The dog is brave, in
which dog designates a superset of collie. But reference by the subset is logically questionable; for example,
Rover is a dog. The dachshund is fat. is unacceptable, because the reference the dachshund is not an accepted way
to refer to the referent of dog; in fact, this usage is a way to convey new information in certain settings (see Haviland
& Clark, 1974; Clark & Haviland, 1977, for more discussion). But in most situations, it should probably be
expressed as Rover is a dog. The dog, which is a dachshund, is fat. However, in military text, there appear to be
many cases where reference by subset appears, as in The T-38 is a supersonic aircraft. The fighter ..., in which
aircraft is technically a superset (once or twice removed) of fighter. While this might again simply be a device to
convey new information, it must follow the set relationships. Since in this exploration it was desirable to give CCS
every opportunity to resolve references, both reference by subset and reference by superset were allowed.

The third relationship, implied part, was based on part-whole relations, with possible intervening subset-superset
relations. If the new referent was a subpart of a previously mentioned item, or a part of a subset or superset of a
previous item, then it was accepted as an implied referent. For example, if an aircraft had been mentioned, and the
unresolved reference was the propeller, the resulting relationship would be that a propeller is part of an airplane and
airplanes are a subset of aircraft. Thus mixtures of set relationships and part-whole relationships were accepted as
implying the existence of the new referent.
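
The three relationship tests described above can be summarized in a short Python sketch. Everything in it is an illustrative assumption: the toy network `GRAPH` loosely follows the examples in the text rather than the actual reduced database, the names `closure` and `classify` are hypothetical, and concepts are compared directly, so the word-to-concept (synonym) lookup behind the same-concept test is taken as already done.

```python
# Toy network using the relation markers of Table 3 (hypothetical content).
GRAPH = {
    "WING": {"<P": ["AIRPLANE"]},
    "PROPELLER": {"<P": ["AIRPLANE"]},
    "AIRPLANE": {"<<": ["AIRCRAFT"], ">>": ["FIGHTER"]},
    "FIGHTER": {"<<": ["AIRPLANE"]},
    "AIRCRAFT": {"<<": ["VEHICLE"], ">>": ["AIRPLANE"]},
    "VEHICLE": {">>": ["AIRCRAFT"]},
}


def closure(concept, relation):
    """All concepts reachable from concept by chaining one relation type."""
    found, stack = set(), [concept]
    while stack:
        for nxt in GRAPH.get(stack.pop(), {}).get(relation, []):
            if nxt not in found:
                found.add(nxt)
                stack.append(nxt)
    return found


def classify(new, previous):
    """Name the relationship by which previous implies new, or return None."""
    if new == previous:
        return "SAME-CONCEPT"
    if previous in closure(new, "<<"):
        return "IMPLIED-SUBSET"      # new is a subset of the previous item
    if previous in closure(new, ">>"):
        return "IMPLIED-SUPERSET"    # new is a superset of the previous item
    for whole in GRAPH.get(new, {}).get("<P", []):
        # The whole itself, or any of its supersets or subsets, may be
        # the previously mentioned item.
        family = {whole} | closure(whole, "<<") | closure(whole, ">>")
        if previous in family:
            return "IMPLIED-PART"
    return None


print(classify("PROPELLER", "AIRCRAFT"))
```

In the propeller example the part relation leads to airplane, and the subset chain from airplane reaches aircraft, which is the mixture of part-whole and set relationships the text describes.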

It should be noted that there is no attempt to guide the search by the general context of the discussion, meaning
that any connection between the new item and previous ones is accepted. Thus, for example, even in the context of
aircraft, the sense of fighter as a boxer may be considered, and might even be the identified connection if the
passage contained any related referents, such as combatant.

CCS was augmented with the semantic network and additional production rules to perform the semantics-based
reference resolution. Some example output from the augmented CCS is shown excerpted in Table 4. The relevant
semantic network is shown in Figure 2. Notice that the references to parts, supersets, and subsets of the
initially mentioned aircraft are successfully resolved.



Table 4

Excerpt from CCS output illustrating implied reference processing; input sentences in boldface

The F-1 is an aircraft.

The wing is long.

Assuming that these newly introduced items are part of previously mentioned items:
New REF3 WING (concept: ^N-WING6) is part of REF2 AIRCRAFT (concept: ^N-AIRCRAFT)
Check: Is this correct?

The flaps are big.

IMPLIED-PART
New REF[?] FLAPS (concept: ^N-FLAP5) is part of REF3 WING (concept: ^N-WING6)

The airplane is expensive.

Assuming that these newly introduced items refer to previously mentioned items:
New REF6 AIRPLANE (concept: ^N-AIRPLANE) is included by, and refers to, REF2 AIRCRAFT
(concept: ^N-AIRCRAFT)
Check: Is this correct?

The fighter is essential.

IMPLIED-SUBSET
New REF7 FIGHTER (concept: ^N-FIGHTER4) is included by, and refers to, REF6 AIRPLANE
(concept: ^N-AIRPLANE)

The vehicle has wheels.

IMPLIED-SUPERSET
Assuming that these newly introduced items refer to previously mentioned items:
New REF8 VEHICLE (concept: ^N-VEHICLE) includes and refers to REF2 AIRCRAFT
(concept: ^N-AIRCRAFT)
Check: Is this correct?

Figure 2. Part of the semantic net relevant to the Table 4 example. [Diagram not reproduced: concept nodes
including VEHICLE, AIRCRAFT, FLEET, AIRPLANE, WING6, FLAP5, FIGHTER4, and BOXER2, joined by subset, part,
member, and word links.]

Results 

[...] operates correctly, within its [...]

[...] number of questionable new referents was [...] mechanisms, and so look as if the reader is expected to [...]
simple reference [...] appear. For some of these cases the [...] earlier in the passage [...] questionable reference
[...]. The [...] were tabulated in terms of what type of relation was involved, and whether the identified
relationship expressed a reasonable semantic relation [...] references in passage sentences that could not be [...]

[...] The general result is that many of the identified relationships were incorrect [...]
A Navy Rate Training Manual Excerpt






[...] principles or physical theory; it is mostly a description of the typical system structure associated with air ejectors.
Table 5 summarizes the results.

The few cases of correct semantic relations appear to be just fortuitously correct. The incorrect relations found are
often due to matching only on the head noun. But note that while this simplification produced many false alarms, it
failed to produce many hits. Using context would have blocked some of the false results, but would not have
produced any more hits. The coverage of the database is apparently rather spotty, containing word usages that are
unusual in a technical context.



A Historical Text 

Air War is a 48-sentence passage prepared in collaboration with Bruce Britton for an evaluation (in
progress) of CCS, and is based on one prepared by Britton for recall studies. It had been modified from Britton's
original so that most of the sentences could be parsed correctly enough for CCS to produce usable criticisms. Air
War is a discussion of the Johnson Administration's Vietnam War policy of bombing North Vietnam. It is written in
a somewhat formal style, with the subject matter being historical, and concerned with policy and administrative
decisions rather than purely technical content. It was chosen for this study after the purely technical passages
produced few correct implied reference solutions; perhaps the WordNet database does not adequately cover technical
content, and so the less specialized subject matter of Air War might engage more of the WordNet database. Table 6
summarizes the results.

The literary style of this passage apparently produced a lot of variation in reference forms, which the same-concept
mechanism was able to compensate for through its simple-minded matching only on the head noun. While it was
often correct, this was in fact due to a relatively small number of distinct words being used with different modifiers.
Certain very vague and abstract words, such as significance and sense, produced many false connections, and some of
the identified relations were especially out of context. Despite the relatively nontechnical content of this passage,
many appropriate connections were not available. For example, the topic of war certainly implies the enemy in the
sentence ...could not defeat the enemy in the field, which could not be resolved. The problem is that the relation
between war and enemy is not categorizable in terms of the simple semantic relations; some more complex
relationship, such as action-participant, would be required. Of course, not all references could be resolved on
semantic information; an example is the clearly "episodic" knowledge required to resolve the Tonkin Gulf incident
in the context of The Vietnam War.

A Pilot's Flight Manual 

T-38 Flight Control is the text from about 3 pages of the T-38 Flight Manual (essentially the "owner's manual"
for the T-38 supersonic trainer aircraft). It is typical technical descriptive text and describes the flight control
surfaces of the airplane and their associated cockpit controls. It was modified very slightly to increase the number of
sentences that would parse successfully by correcting some very idiosyncratic sentence structures, and it was given
an overall title of T-38 Airplane because the original excerpt made no mention of an airplane being involved until
very late in the passage, which severely limited the implied reference searches. Since the same-concept relation did
not work very well, it was disabled in the test described here. Table 7 summarizes the results.

This passage contained a few cases which were textbook examples of implied references, due to the database
having exactly the required relations. However, the database also failed to include some lower-level information,
such as the above-mentioned fact that throttles have quadrants, and even less specialized facts, such as that switches
and controls have positions.

Overall Results 

Table 8 totals the statistics across the three passages. Using the semantic relations allowed about half of the
questionable references to be resolved (74 of 147), but only roughly a fifth of the resulting relations were reasonably
correct (17 of 74), and most of these were due to one passage, Air War, in which varied forms with the same head noun
were used in a way that would not be applicable to most technical text. The many incorrect relations could be suppressed
by a more sophisticated approach to searching and matching implied references, but it is disappointing that there were
not more correct relations found. This can be attributed to the fact that the semantic net, which contained the subset of
the WordNet database related to the CCS lexicon, was not rich enough (at least in the abridged version used
here). For example, many of the part-whole relationships required for the technical passages were not present, and the
less specialized knowledge involved in Air War was also not present.
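
The proportions quoted above can be checked with a short calculation that uses the counts reported in Table 8 and in the per-passage result tables. The variable names are chosen here for illustration only.

```python
# Counts copied from Table 8 (totals across the three test passages).
questionable = 147  # questionable references
resolved = 74       # references resolved via semantic relations
correct = 17        # correct relations identified

# Correct relations per passage, from the per-passage result tables.
correct_by_passage = {"Ejectors": 4, "Air War": 10, "T-38 Flight Control": 3}

resolved_share = resolved / questionable
correct_share = correct / resolved
air_war_share = correct_by_passage["Air War"] / correct

print(f"questionable references resolved: {resolved_share:.0%}")  # 50%
print(f"resolved relations that were correct: {correct_share:.0%}")  # 23%
print(f"correct relations from Air War: {air_war_share:.0%}")  # 59%
```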
Table 5

Results for Ejectors passage

Total number of sentences and headings: 48
Sentences and headings parsed: 33
Referents constructed: 166
Questionable references: 35
Questionable references not resolved: 12
References resolved via semantic relations: 23
Correct relations identified: 4
Incorrect relations identified: 19



Same-concept relation
Correct: 3 cases

Example: the most commonly used air ejector is the same as the air ejector introduced in the first sentence of
the passage, The air ejector removes air and noncondensable gases from the condenser.

Incorrect: 11 cases

Example: the gland exhaust condenser is not the same as the condenser introduced in the first sentence of the
passage.

Part-whole and sub/superset relations
Correct: 1 case

Example: the steam in the sentence Figure 6-13 shows the flow of the steam, air, and noncondensable gases
in one type of air ejector unit, is correctly associated as a subset of substance in the previous sentence The
flow of a substance from a higher pressure area ... .

Incorrect: 8 cases

Example: the valve in the sentence When you open the make-up feed valve was incorrectly identified as an
electrical component (via the British valve = vacuum tube) that is related to the previous condenser, which is an
obsolete synonym for capacitor, an electrical component.



Table 6

Results for Air War passage

Total number of sentences and headings: 48
Sentences and headings parsed: 48
Referents constructed: 313
Questionable references: 71
Questionable references not resolved: 44
References resolved via semantic relations: 27
Correct relations identified: 10
Incorrect relations identified: 17

Same-concept relation
Correct: 10 cases

Example: The primary objective was correctly identified with the objective in the previous ...serious
differences arose over both the objective and the methods to be used.

Incorrect: 4 cases

Example: The beginning in From the beginning, Rolling Thunder was hedged with restrictions... was
identified with the source in ...Hanoi as the source of the continuing problem in the south.

Part-whole and sub/superset relations
Correct: 0 cases
Incorrect: 13 cases

Example: The face in ...would not risk its fragile and limited industrial base in the face of overwhelming
American power was interpreted as a part of the human body, and was circuitously associated with the
extension in ...over the extension of the war... as being a body part.
Table 7

Results for T-38 Flight Control passage

Total number of sentences and headings: 91
Sentences and headings parsed: 67
Referents constructed: 287
Questionable references: 41
Questionable references not resolved: 17
References resolved via semantic relations: 24
Correct relations identified: 3
Incorrect relations identified: 21

Part-whole and sub/superset relations
Correct: 3 cases

Example: The flaps in the sentence The wing flaps are electrically controlled by a flap lever are recognized as a
part implied by the object mentioned in the title: T-38 Airplane.

Incorrect: 21 cases

Example: The quadrant in The wing flap lever is located on the throttle quadrant of each cockpit, is not
recognized as part of the equipment in an airplane cockpit, but is incorrectly associated via the concept measure
with the amount in the previous sentence ... by increasing the amount of horizontal tail deflection ... .



Table 8

Overall results totaled across the three test passages

Total number of sentences and headings: 187
Sentences and headings parsed: 148
Referents constructed: 767
Questionable references: 147
Questionable references not resolved: 73
References resolved via semantic relations: 74
Correct relations identified: 17
Incorrect relations identified: 57

Same-concept relation (two passages only)
Correct: 13 cases
Incorrect: 15 cases

Part-whole and sub/superset relations
Correct: 4 cases
Incorrect: 42 cases
Conclusions



This work shows that a natural-language processing system such as CCS can make use of a semantic database such
as WordNet. However, in the test cases here, a broad, shallow, general database such as WordNet does not seem to
have enough unspecialized knowledge to substantially reduce the number of unresolvable questionable
references. While the knowledge that throttles have quadrants is quite specialized, the notion that wars have
enemies is not. [...] in the relevant ways would require a more complex set of semantic
relations, and would [...] to take on a specialized flavor; for example, in the technical domain of airplanes,
the database would need to include the complete list of airplane parts that it is reasonable to assume the typical
reader knows. This would be a long list, but the remarkable thing is that as large as the WordNet noun database is,
it has only a few facts about each technical domain; the result is a large, general, semantic lexicon that
does not have very much in any single domain. Thus, a WordNet-style database would have to be much larger, and
built with more of an eye towards technical coverage, in order to form a basis for the types of semantic
reference resolution explored in this work.

Two routes for further work are possible. First, many of the cases where the semantic relations were
correct would be amenable to a simpler treatment; for example, identifying primary objective with a previously
mentioned objective of the war could be done with an extension to the current simple reference resolution process,
[...] the inappropriate comment asking the writer to check whether this is the intended meaning. Thus
the results suggest some extension to the current no-semantics approach in CCS. In fact, these results imply that
the original intent to try a no-semantics approach was a good one, given that most of the passage references could
be resolved with the original simple resolution process, and for those references that couldn't, a large semantic
database was not very useful.

A second approach would be more scientifically interesting. The lexicon in technical domains could be
characterized more systematically, and the semantic databases for technical domains could be developed. This could
be done manually, but a more interesting possibility would be to develop natural-language processing software capable
of processing the definitions in a technical glossary to construct a WordNet-style list of simple semantic relations
automatically. The mechanisms in CCS that can parse many technical sentences and resolve references might make
good foundations for such software.
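
As a rough illustration of this second route, the sketch below pulls simple relations out of glossary-style definitions using two fixed phrase patterns. It is only an assumption about how such software might start: it uses regular expressions rather than the CCS parser, and the glossary entries, the patterns, and the name extract_relations are all hypothetical.

```python
import re

# Hypothetical glossary entries, written for illustration only.
GLOSSARY = {
    "aileron": "A control surface that is part of the wing.",
    "throttle quadrant": "A panel that is part of the cockpit.",
    "rudder": "A type of control surface.",
}

# Each pattern maps a definition phrase to one simple relation type.
PATTERNS = [
    (re.compile(r"\bpart of the ([a-z ]+?)\."), "part-whole"),
    (re.compile(r"\btype of ([a-z ]+?)\."), "sub/superset"),
]


def extract_relations(glossary):
    """Return (term, relation, related_term) triples found in the definitions."""
    relations = []
    for term, definition in glossary.items():
        text = definition.lower()
        for pattern, relation in PATTERNS:
            match = pattern.search(text)
            if match:
                relations.append((term, relation, match.group(1)))
    return relations


for triple in extract_relations(GLOSSARY):
    print(triple)
```

A real glossary would contain far more varied sentence structures than these two patterns cover, which is why a parser of the kind already present in CCS is suggested above as the more promising foundation.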